
190 research outputs found

Hierarchical multi-stream posterior based speech recognition system

Author: H. Bourlard
H. Hermansky
L. Mangu
L.R. Rabiner
S. Dupont
Publication date: 01/01/2006

Abstract. In this paper, we present initial results towards boosting posterior-based speech recognition systems by estimating more informative posteriors using multiple streams of features and taking into account acoustic context (e.g., as available in the whole utterance), as well as possible prior information (such as topological constraints). These posteriors are estimated based on the “state gamma posterior” definition (typically used in standard HMM training), extended to the case of multi-stream HMMs. This approach provides a new, principled theoretical framework for hierarchical estimation and use of posteriors, multi-stream feature combination, and integration of appropriate context and prior knowledge into posterior estimates. In the present work, we used the resulting gamma posteriors as features for a standard HMM/GMM layer. On the OGI Digits database and on a reduced-vocabulary version (1000 words) of the DARPA Conversational Telephone Speech-to-text (CTS) task, this resulted in a significant performance improvement compared to state-of-the-art Tandem systems.
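The abstract builds on the “state gamma posterior”, γ_t(i) = P(q_t = i | X), which the forward-backward algorithm computes during standard HMM training. The following is a minimal sketch for a single-stream HMM only, assuming per-frame emission likelihoods are already available from some acoustic model; the function name `gamma_posteriors` and its input layout are illustrative assumptions, and the multi-stream extension described in the paper is not reproduced here.

```python
from typing import List

Matrix = List[List[float]]


def gamma_posteriors(pi: List[float], A: Matrix, B: Matrix) -> Matrix:
    """Return gamma[t][i] = P(q_t = i | X) for a single-stream HMM.

    pi: initial state probabilities (hypothetical input layout)
    A:  transition matrix, A[j][i] = P(q_t = i | q_{t-1} = j)
    B:  emission likelihoods, B[t][i] = p(x_t | q_t = i)
    """
    n_frames, n_states = len(B), len(pi)

    # Scaled forward pass: alpha[t][i] is proportional to p(x_1..x_t, q_t = i).
    alpha: Matrix = []
    scales: List[float] = []
    for t in range(n_frames):
        if t == 0:
            row = [pi[i] * B[0][i] for i in range(n_states)]
        else:
            prev = alpha[t - 1]
            row = [
                sum(prev[j] * A[j][i] for j in range(n_states)) * B[t][i]
                for i in range(n_states)
            ]
        c = sum(row)  # per-frame scale factor keeps values from underflowing
        alpha.append([v / c for v in row])
        scales.append(c)

    # Scaled backward pass: beta[t][i] is proportional to p(x_{t+1}..x_T | q_t = i).
    beta: Matrix = [[1.0] * n_states for _ in range(n_frames)]
    for t in range(n_frames - 2, -1, -1):
        beta[t] = [
            sum(A[i][j] * B[t + 1][j] * beta[t + 1][j] for j in range(n_states))
            / scales[t + 1]
            for i in range(n_states)
        ]

    # gamma[t][i] is alpha[t][i] * beta[t][i], renormalised per frame.
    gammas: Matrix = []
    for t in range(n_frames):
        row = [alpha[t][i] * beta[t][i] for i in range(n_states)]
        z = sum(row)
        gammas.append([v / z for v in row])
    return gammas
```

For a two-state left-to-right model, `gamma_posteriors([1.0, 0.0], [[0.5, 0.5], [0.0, 1.0]], [[0.9, 0.1], [0.5, 0.5], [0.1, 0.9]])` returns one probability vector per frame, each summing to one. Per-frame scaling is used instead of log arithmetic to keep the sketch short.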

Infoscience - École polytechnique fédérale de Lausanne

CiteSeerX

Crossref

Towards Robust and Adaptive Speech Recognition Models

Author: B Kingsbury
H Hermansky
H Mcgurk
J Allen
S Rao
T Houtgast
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2004

Crossref

Multiple Classifier Systems for the Classification of Audio-Visual Emotional States

Author: B. Schuller
B. Schölkopf
D.W. Robinson
E. Rolls
F. Schwenker
F. Zheng
H. Hermansky
J. Mutch
L. Breiman
L. Devillers
L. Kuncheva
L.R. Rabiner
M. Riesenhuber
M. Schmidt
P. Bayerl
P. Oudeyer
R. Cowie
S. Davis
S. Walter
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2011

Abstract. Research activities in the field of human-computer interaction have increasingly addressed the integration of some type of emotional intelligence. Human emotions are expressed through different modalities such as speech, facial expressions, and hand or body gestures; therefore, the classification of human emotions should be considered a multimodal pattern recognition problem. The aim of our paper is to investigate multiple classifier systems that use audio and visual features to classify human emotional states. For that purpose, a variety of features have been derived. From the audio signal, the fundamental frequency, LPC and MFCC coefficients, and RASTA-PLP have been used. In addition, two types of visual features have been computed, namely form and motion features of intermediate complexity. The numerical evaluation has been performed on the four emotional labels Arousal, Expectancy, Power, and Valence as defined in the AVEC data set. Multiple classifier systems are applied as the classifier architecture; these have been shown to be accurate and robust against missing and noisy data.
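The abstract does not specify how the individual audio and visual classifiers are combined. One common multiple-classifier fusion rule is a weighted average of per-class scores; the sketch below assumes that rule, uses illustrative binary "high"/"low" labels for a single dimension such as Arousal, and skips any classifier whose output is missing, which is one simple way an ensemble can tolerate missing data. The function names are hypothetical, and this is not presented as the authors' architecture.

```python
from typing import Dict, List, Optional

Scores = Dict[str, float]


def fuse_posteriors(outputs: List[Optional[Scores]],
                    weights: Optional[List[float]] = None) -> Scores:
    """Weighted-average fusion of per-class scores from several classifiers.

    A classifier whose output is None (e.g. no face detected in the video)
    is skipped, so the remaining modalities still produce a decision.
    """
    if weights is None:
        weights = [1.0] * len(outputs)
    fused: Scores = {}
    total = 0.0
    for out, w in zip(outputs, weights):
        if out is None:
            continue  # missing modality: ignore this classifier
        total += w
        for label, p in out.items():
            fused[label] = fused.get(label, 0.0) + w * p
    if total == 0.0:
        raise ValueError("no classifier produced an output")
    # Divide by the summed weights of the classifiers actually used.
    return {label: s / total for label, s in fused.items()}


def decide(fused: Scores) -> str:
    """Pick the label with the highest fused score."""
    return max(fused, key=lambda label: fused[label])
```

With an audio classifier giving `{"high": 0.7, "low": 0.3}` and a visual one giving `{"high": 0.4, "low": 0.6}`, equal weights yield roughly 0.55 for "high"; if the audio output is missing, the decision falls back to the visual classifier alone.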

CiteSeerX

Crossref

DWT and LPC based feature extraction methods for isolated word recognition

Author: AE Rosenberg
B Kotnik
DS Pallett
F Itakura
H Hermansky
J Xu
JN Gowdy
K Wang
KP Soman
L Rabiner
M Gupta
M Krishnan
MJF Gales
Navnath S Nehe
NS Nehe
O Farooq
Raghunath S Holambe
S Mallat
SB Davis
SF Boll
Y Hao
Z Tufekci
Publication venue: 'Springer Science and Business Media LLC'

Crossref

New single-ended objective measure for non-intrusive speech quality evaluation

Author: A.W. Rix
Abdulhussain E. Mahdi
Dorel Picovici
H. Hermansky
J. Vesanto
J.G. Beerends
J.L. Hall
K. Gopalan
L. Malfait
M.R. Schroeder
P. Gray
S. Voran
S. Wang
T.E. Quatieri
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 21/10/2014

This article proposes a new output-based method for non-intrusive assessment of the speech quality of voice communication systems and evaluates its performance. The method requires access to the processed (degraded) speech only, and is based on measuring perception-motivated objective auditory distances between the voiced parts of the output speech and appropriately matching references extracted from a pre-formulated codebook. The codebook is formed by optimally clustering a large number of parametric speech vectors extracted from a database of clean speech recordings. The auditory distances are then mapped into objective Mean Opinion Scores of listening quality. An efficient data-mining tool known as the self-organizing map (SOM) performs the required clustering and mapping/reference-matching processes. In order to obtain a perception-based, speaker-independent parametric representation of the speech, three domain transformation techniques have been investigated. The first technique is based on a perceptual linear prediction (PLP) model, the second utilises a bark spectrum (BS) analysis and the third utilises mel-frequency cepstrum coefficients (MFCC). Reported evaluation results show that the proposed method provides high correlation with subjective listening quality scores, yielding accuracy similar to that of ITU-T P.563 while maintaining relatively low computational complexity. Results also demonstrate that the method outperforms PESQ in a number of distortion conditions, such as speech degraded by channel impairments.
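The core of the method, a codebook learned from clean speech plus a distance from degraded frames to their closest codebook entries, can be sketched as follows. This is a simplified illustration under stated assumptions: a one-dimensional self-organizing map, plain Euclidean distance in place of the paper's perception-motivated auditory distance, toy feature vectors instead of PLP, bark-spectrum or MFCC features, and a hypothetical linear distance-to-MOS mapping whose coefficients `a` and `b` would in practice be fitted to subjective scores. All function names are illustrative.

```python
import math
import random
from typing import List, Sequence

Vector = List[float]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def train_som(data: List[Vector], n_units: int, epochs: int = 50,
              lr0: float = 0.5, seed: int = 0) -> List[Vector]:
    """Train a 1-D self-organizing map; its unit weights form the codebook."""
    rng = random.Random(seed)
    units = [list(rng.choice(data)) for _ in range(n_units)]  # init from data
    radius0 = max(1.0, n_units / 2)
    n_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        for x in rng.sample(data, len(data)):  # shuffled pass over the data
            frac = step / n_steps
            lr = lr0 * (1.0 - frac)  # linearly decaying learning rate
            radius = max(0.5, radius0 * (1.0 - frac))  # shrinking neighbourhood
            bmu = min(range(n_units), key=lambda k: euclidean(units[k], x))
            for k in range(n_units):
                # Gaussian neighbourhood on the 1-D map grid around the BMU.
                h = math.exp(-((k - bmu) ** 2) / (2 * radius ** 2))
                units[k] = [w + lr * h * (xi - w) for w, xi in zip(units[k], x)]
            step += 1
    return units


def mean_codebook_distance(frames: List[Vector], codebook: List[Vector]) -> float:
    """Average distance from each frame to its best-matching codebook vector."""
    return sum(min(euclidean(f, c) for c in codebook) for f in frames) / len(frames)


def distance_to_mos(d: float, a: float = 4.5, b: float = -1.0) -> float:
    """Hypothetical linear mapping of distance to a score clipped to [1, 5]."""
    return min(5.0, max(1.0, a + b * d))
```

Training on clean vectors and then comparing `distance_to_mos(mean_codebook_distance(frames, codebook))` for clean frames versus shifted (degraded) frames should give a lower score for the degraded set, since the codebook stays close to the clean data.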

University of Limerick Institutional Repository

Crossref

Design, development and field evaluation of a Spanish into sign language translation system

Author: A. García
D. Sánchez
DI Fels
E Efthimiou
F Casacuberta
F. Fernández
H Hermansky
J Och
J Wong
J. M. Montero
JB Mariño
JL Gauvain
L. F. D’Haro
R San-Segundo
R. Córdoba
R. San-Segundo
S Möller
V. López-Ludeña
V. Sama
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2012

This paper describes the design, development and field evaluation of a machine translation system from Spanish to Spanish Sign Language (LSE: Lengua de Signos Española). The developed system is designed to help Deaf people when they renew their Driver’s License. The system is made up of a speech recognizer (for decoding the spoken utterance into a word sequence), a natural language translator (for converting a word sequence into a sequence of signs belonging to the sign language), and a 3D avatar animation module (for playing back the signs). For the natural language translator, three technological approaches have been implemented and evaluated: an example-based strategy, a rule-based translation method and a statistical translator. In the final version, the language translator combines all the alternatives into a hierarchical structure. This paper includes a detailed description of the field evaluation, which was carried out in the Local Traffic Office in Toledo and involved real government employees and Deaf people. The evaluation includes objective measurements from the system and subjective information from questionnaires. The paper details the main problems found and discusses how to solve them (some of them specific to LSE).
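The abstract states that the final translator combines the example-based, rule-based and statistical approaches in a hierarchical structure but gives no further detail. The sketch below assumes a simple cascade: use a stored example if the input is within a word-level edit-distance threshold, otherwise apply rules if they cover the whole sentence, otherwise fall back to a statistical translator. The lexicon-style "rules", the threshold, the uppercase sign glosses and the fallback are all hypothetical stand-ins, not the system's actual components.

```python
from typing import Callable, Dict, List, Optional, Tuple

Translator = Callable[[List[str]], Optional[List[str]]]


def edit_distance(a: List[str], b: List[str]) -> int:
    """Word-level Levenshtein distance (substitution, insertion, deletion)."""
    prev = list(range(len(b) + 1))
    for i, wa in enumerate(a, 1):
        cur = [i]
        for j, wb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (wa != wb)))
        prev = cur
    return prev[-1]


def make_example_based(examples: Dict[Tuple[str, ...], List[str]],
                       max_dist: int) -> Translator:
    """Return the signs of the closest stored example, if it is close enough."""
    def translate(words: List[str]) -> Optional[List[str]]:
        best = min(examples, key=lambda src: edit_distance(words, list(src)),
                   default=None)
        if best is not None and edit_distance(words, list(best)) <= max_dist:
            return examples[best]
        return None  # no sufficiently close example
    return translate


def make_rule_based(lexicon: Dict[str, str]) -> Translator:
    """Toy stand-in for rules: succeeds only if every word has a mapping."""
    def translate(words: List[str]) -> Optional[List[str]]:
        if all(w in lexicon for w in words):
            return [lexicon[w] for w in words]
        return None
    return translate


def hierarchical_translate(words: List[str], translators: List[Translator],
                           fallback: Callable[[List[str]], List[str]]) -> List[str]:
    """Try each translator in priority order; use the fallback if all decline."""
    for t in translators:
        out = t(words)
        if out is not None:
            return out
    return fallback(words)
```

Ordering the stages from most to least reliable means a close match to a stored example is preferred, while the statistical stage guarantees that every input still receives some output.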

Crossref

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Archivo Digital UPM

A bio-inspired feature extraction for robust speech recognition

Author: BCJ Moore
BR Glasberg
BS Atal
C Nadeu
DL Wang
H Beigi
H Hermansky
H Hirsch
J Garofolo
JP Martens
L Rabiner
LM Van Immerseel
M Unokia
R Meddis
RD Patterson
RF Lyon
S Bleeck
S Furui
S Young
SB Davis
T Irino
Y Zouhir
Publication venue: 'Springer Science and Business Media LLC'

Crossref

Biomimetic multi-resolution analysis for robust speaker recognition

Author: C Schreiner
D Garcia-Romero
D Zotkin
Dmitry N Zotkin
H Beigi
H Hermansky
H Hirsch
H Steeneken
H Versnel
J Woojay
JS Garofolo
K O’Connor
K Wang
L Miller
M Elhilali
Mounya Elhilali
P Kenny
P Loizou
Q Wu
R Auckenthaler
R Drullman
Ramani Duraiswami
S Greenberg
Sridhar Krishna Nemala
T Arai
T Cover
T Elliott
T Kinnunen
X Yang
Publication venue: 'Springer Science and Business Media LLC'

Crossref

Microdevices for extensional rheometry of low viscosity elastic liquids: a review

Author: A Bazilevsky
AG Balducci
AG Banpurkar
AM Ardekani
B Berge
C Pipe
CG Hermansky
CJS Petrie
CW Macosko
DR Link
E Bänsch
ESG Shaqfeh
F Mugele
F. J. Galindo-Rosales
FR Phelan Jr
FT Trouton
G Beni
GG Fuller
GH McKinley
GI Taylor
GM Whitesides
H Münsted
HA Barnes
HCH Bandalusena
HP Babcock
J Husny
J Meissner
J Remmelgas
J Soulages
J Wang
JA Odell
JA Pathak
JE Matta
JH Song
JM Maia
JP Rothstein
JS Lee
K Niedzwiedz
K Nijenhuis
L Campo-Deaño
LE Rodd
M Padmanabhan
M Roche
M Sentmanat
M Tanyeri
M. A. Alves
M. S. N. Oliveira
MA Alves
MG Pollack
MK Tan
MSN Oliveira
N Kojic
N Kumari
P Becherer
P Dontula
P Erni
P Guillot
PC Sousa
PE Arratia
PK Bhattacharjee
R Dylla-Spears
R Sattler
R Zheng
RB Bird
RI Tanner
RJ Poole
RR Lagnado
S Gaudet
S Ríos
SD Hudson
SJ Haward
SL Anna
SL Ng
SS Hsieh
T Cubaud
T Funami
T Schweizer
T Sridhar
TM Squires
TT Perkins
W Lee
WC Nelson
WW Schultz
YY Lin
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2013

Extensional flows and the underlying stability/instability mechanisms are highly relevant to the efficient operation of inkjet printing, coating processes and drug delivery systems, as well as to the generation of microdroplets. The development of an extensional rheometer to characterize the extensional properties of low-viscosity fluids has therefore attracted great interest among researchers, particularly in the last decade. Microfluidics has proven to be an extraordinary working platform, and different configurations of potential extensional microrheometers have been proposed. In this review, we present an overview of several successful designs, together with a critical assessment of their capabilities and limitations.

Crossref

University of Strathclyde Institutional Repository

Dereverberation and denoising based on generalized spectral subtraction by multi-channel LMS algorithm using a small-scale microphone array

Author: A Lee
BL Sim
C Avendano
C Raut
H Hermansky
H Maganti
K Itou
Kalle J Palomaki
L Wang
Q Jin
Raj Bhiksha
S Gannot
S Makino
S Subramaniam
T Nishiura
Y Huang
Publication venue: 'Springer Science and Business Media LLC'

Crossref